Chapter 1 Data–Its Source and Compilation
In our daily lives and in various forms of media like television news or geographical books, we frequently encounter numerical information. This information represents measurements or counts from the real world and is known as data.
A single measurement is called a datum.
For example, figures like rainfall amounts (20 cm, 35 cm) or distances between cities (1385 km, 1542 km) are all considered data.
While an immense amount of data is available today, drawing meaningful conclusions from it is difficult if it remains in its unprocessed, original form (raw data). Useful information therefore has to be derived, deduced, or calculated logically and/or statistically from multiple data points.
When data is organized and processed to provide a meaningful answer to a question or to stimulate further inquiry, it becomes information.
Need of Data
In geography, maps are fundamental tools for understanding spatial distributions. However, data presented in tables or used for statistical analysis is equally important for explaining phenomena like population growth, distribution patterns, or the flow of goods.
Geographical phenomena often interact with each other across the Earth's surface. These interactions are influenced by many factors or variables. To understand these relationships precisely, especially their quantitative aspects, analyzing relevant data using statistical methods has become essential.
For instance, to study agricultural patterns in an area, one needs quantitative data on factors like the extent of cropped land, crop yields, total production, irrigated area, rainfall amounts, and inputs used (fertilizers, pesticides).
Similarly, analyzing the growth and characteristics of a city requires data on its total population, population density, migration figures, occupational structure, income levels, industries, and transport/communication infrastructure.
Thus, data is indispensable for conducting thorough geographical analysis.
Presentation of the Data
Simply collecting data is not enough; how it is presented and analyzed is equally crucial. Misinterpreting raw data, or relying solely on simple averages without considering how the values are distributed, can lead to conclusions that depart from the actual situation (a statistical fallacy).
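To see how an average on its own can mislead, here is a minimal Python sketch using two hypothetical stations (the rainfall figures are invented purely for illustration). Both receive the same annual total and therefore the same monthly mean, yet one gets its rain evenly while the other gets it all in three months.

```python
from statistics import mean, pstdev

# Hypothetical monthly rainfall (cm); illustrative values only, not real station data.
station_a = [10] * 12                 # rain spread evenly through the year
station_b = [0] * 9 + [40, 40, 40]    # all rain concentrated in three months

for name, rain in (("A", station_a), ("B", station_b)):
    # Same mean, very different spread: the mean alone hides the seasonality.
    print(name, "mean:", mean(rain),
          "std dev:", round(pstdev(rain), 2),
          "wettest month:", max(rain))
```

Looking at the spread (standard deviation) or the wettest month alongside the mean reveals the difference that the average conceals.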
Statistical methods are widely used today in almost all fields, including geography, for analyzing, presenting, and interpreting data to draw sound conclusions.
Quantitative analysis is increasingly preferred over purely qualitative descriptions to explain relationships between geographical variables. This shift necessitates the use of analytical tools and techniques for collecting, compiling, tabulating, organizing, and analyzing data to arrive at logical and precise findings.
Sources of Data
Data can be obtained from different origins, which are broadly categorized into two main types:
1. Primary Sources: Data collected for the very first time by the individual or organization conducting the research.
2. Secondary Sources: Data that has already been collected, processed, or published by another individual or organization, and is being used by a different researcher.
Fig. 1.1 (refer to diagram in text) illustrates the various methods used for collecting both primary and secondary data.
Sources of Primary Data
Primary data is collected directly from the source using several methods:
1. Personal Observations: This involves collecting information by directly observing phenomena in the field. Through a field survey, one can gather data on physical features (relief, drainage, soil types, vegetation), demographic characteristics (population structure, sex ratio, literacy), infrastructure (transport, communication), and settlement patterns (rural, urban). This method requires the observer to have relevant theoretical knowledge and an objective, unbiased approach.
2. Interview: The researcher obtains information directly from individuals (respondents) through verbal interaction, conversation, or dialogue. Key considerations for conducting effective interviews include preparing a clear list of questions, having a clear objective, building rapport with respondents, ensuring privacy for sensitive information, using simple and polite language, avoiding offensive questions, and asking for additional information.
3. Questionnaire/Schedule: These involve a set of written questions used to collect data. A questionnaire is filled out by the respondent themselves, often by selecting from pre-provided answers or writing brief responses. It is useful for covering large areas and can be mailed. A limitation is that it is only suitable for literate respondents. A schedule is similar but filled out by a trained enumerator who asks the questions verbally to the respondent. This method allows data collection from both literate and illiterate individuals.
4. Other Methods: Direct measurement using specialized tools can also be a source of primary data. For example, collecting data on soil or water properties using testing kits, or measuring crop health using transducers.
Secondary Source of Data
Secondary data is obtained from existing records or publications. These can be published or unpublished.
Published Sources:
- 1. Government Publications: Various ministries and departments of the Central and State governments, along with District Bulletins, publish reports and data. Important examples include the Census of India (Registrar General of India), National Sample Survey reports, Weather Reports (India Meteorological Department), State Statistical Abstracts, and reports from various Commissions.
- 2. Semi/Quasi-government Publications: Publications and reports from local government bodies such as Urban Development Authorities, Municipal Corporations, and Zila Parishads (District Councils).
- 3. International Publications: Yearbooks, reports, and monographs published by agencies of the United Nations (UNESCO, UNDP, WHO, FAO) and other international organizations. Periodically published UN reports include the Demographic Year Book, Statistical Year Book, and the Human Development Report.
- 4. Private Publications: Yearbooks, surveys, research reports, and monographs produced by private companies, research institutions, and organizations.
- 5. Newspapers and Magazines: Daily newspapers and various periodicals serve as accessible sources of recent secondary data on a wide range of topics.
- 6. Electronic Media: The internet is an increasingly significant source of secondary data, providing access to a vast amount of information from various sources worldwide.
Unpublished Sources:
- 1. Government Documents: Reports, monographs, and records that are not formally published but maintained as official records at different administrative levels (e.g., village revenue records kept by a patwari).
- 2. Quasi-government Records: Periodical reports, development plans, and other documents maintained by Municipal Corporations, District Councils, and civil service departments.
- 3. Private Documents: Unpublished reports and records of companies, trade unions, political organizations, residents' associations, etc.
Tabulation and Classification of Data
Raw data, whether from primary or secondary sources, is initially unorganized and difficult to comprehend. To make it usable and derive meaningful inferences, it needs to be processed through tabulation and classification.
A Statistical Table is a simple and effective way to summarize and present data. It involves arranging data systematically in rows and columns. The purpose is to simplify presentation, facilitate comparisons between different data points, and allow readers to quickly find specific information.
Tables enable analysts to organize large volumes of data in a structured manner within a limited space.
Data Compilation and Presentation
Data is typically collected, organized, and presented in tables in different formats:
Absolute Data
When data is presented in its original, unprocessed numerical form (as whole numbers or integers), it is called absolute data or raw data. Examples include the total population count of a country or state, or the total production volume of a crop or industry.
Table 1.1 shows the absolute population figures for India and selected states/UTs based on the 2011 Census.
| State/UT Code | India/State/Union Territory | Total Population (Persons) | Males | Females |
|---|---|---|---|---|
|  | INDIA | 1,21,05,69,573 | 62,31,21,843 | 58,74,47,730 |
| 1 | Jammu and Kashmir | 1,25,41,302 | 66,40,662 | 59,00,640 |
| 2 | Himachal Pradesh | 68,64,602 | 34,81,873 | 33,82,729 |
| 3 | Punjab | 2,77,43,338 | 1,46,39,465 | 1,31,03,873 |
| 4 | Chandigarh | 10,55,450 | 5,80,663 | 4,74,787 |
| 5 | Uttarakhand | 1,00,86,292 | 51,37,773 | 49,48,519 |
| 6 | Haryana | 2,53,51,462 | 1,34,94,734 | 1,18,56,728 |
| 7 | National Capital Territory of Delhi | 1,67,87,941 | 89,87,326 | 78,00,615 |
| 8 | Rajasthan | 6,85,48,437 | 3,55,50,997 | 3,29,97,440 |
| 9 | Uttar Pradesh | 19,98,12,341 | 10,44,80,510 | 9,53,31,831 |
| 10 | Bihar | 10,40,99,452 | 5,42,78,157 | 4,98,21,295 |
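Absolute figures like these can be used directly in arithmetic. The short Python sketch below (the helper `to_int` is a hypothetical name, not part of the notes) strips the Indian digit grouping from the Table 1.1 figures and checks that Persons = Males + Females for every row.

```python
# Rows from Table 1.1 (Census 2011), keeping the Indian digit grouping as printed.
table_1_1 = {
    "INDIA": ("1,21,05,69,573", "62,31,21,843", "58,74,47,730"),
    "Jammu and Kashmir": ("1,25,41,302", "66,40,662", "59,00,640"),
    "Himachal Pradesh": ("68,64,602", "34,81,873", "33,82,729"),
    "Punjab": ("2,77,43,338", "1,46,39,465", "1,31,03,873"),
    "Chandigarh": ("10,55,450", "5,80,663", "4,74,787"),
    "Uttarakhand": ("1,00,86,292", "51,37,773", "49,48,519"),
    "Haryana": ("2,53,51,462", "1,34,94,734", "1,18,56,728"),
    "NCT of Delhi": ("1,67,87,941", "89,87,326", "78,00,615"),
    "Rajasthan": ("6,85,48,437", "3,55,50,997", "3,29,97,440"),
    "Uttar Pradesh": ("19,98,12,341", "10,44,80,510", "9,53,31,831"),
    "Bihar": ("10,40,99,452", "5,42,78,157", "4,98,21,295"),
}

def to_int(figure: str) -> int:
    """Turn an Indian-grouped figure such as '1,25,41,302' into a plain integer."""
    return int(figure.replace(",", ""))

for area, row in table_1_1.items():
    persons, males, females = map(to_int, row)
    # Absolute counts can be cross-checked directly: persons = males + females.
    status = "ok" if persons == males + females else "MISMATCH"
    print(f"{area:<20} {persons:>13,} {status}")
```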
Percentage/Ratio
Data can also be presented as percentages or ratios, which are calculated based on a common parameter. This format is useful for comparison and analysis of trends or proportions.
Examples include calculating literacy rates, population growth rates, or the percentage share of different sectors in agricultural or industrial production.
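As a small illustration of a percentage share, the Python sketch below expresses the population of a few states from Table 1.1 as a percentage of India's total. The function name `percentage` is only an illustrative choice; the common parameter here is India's total population.

```python
# Persons figures from Table 1.1 (Census 2011), written as plain integers.
india = 1_210_569_573
states = {
    "Uttar Pradesh": 199_812_341,
    "Bihar": 104_099_452,
    "Rajasthan": 68_548_437,
}

def percentage(part: float, whole: float) -> float:
    """Express part as a percentage of whole (the common parameter)."""
    return part / whole * 100

# Share of each state in India's population, rounded to two decimals.
shares = {name: round(percentage(pop, india), 2) for name, pop in states.items()}
print(shares)
```

Because every state is divided by the same total, the resulting shares can be compared directly even though the absolute populations differ widely.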
Table 1.2 shows the literacy rates in India over several decades, presented as percentages. The literacy rate is calculated using the formula:
$ \text{Literacy Rate} = \frac{\text{Total Number of Literates}}{\text{Total Population}} \times 100 $
| Year | Person (%) | Male (%) | Female (%) |
|---|---|---|---|
| 1951 | 18.33 | 27.16 | 8.86 |
| 1961 | 28.3 | 40.4 | 15.35 |
| 1971 | 34.45 | 45.96 | 21.97 |
| 1981 | 43.57 | 56.38 | 29.76 |
| 1991 | 52.21 | 64.13 | 39.29 |
| 2001 | 64.84 | 75.85 | 54.16 |
| 2011 | 73.0 | 80.9 | 64.6 |
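The formula above translates directly into a one-line function. The Python sketch below applies it to hypothetical counts (invented for illustration) and then uses the male and female rates from Table 1.2 to compute the gender gap in literacy, in percentage points, for each census year.

```python
def literacy_rate(literates: int, population: int) -> float:
    """Literacy rate (%) = total number of literates / total population x 100."""
    return literates / population * 100

# Hypothetical counts, for illustration only.
print(literacy_rate(730, 1000))

# Male and female literacy rates (%) from Table 1.2.
table_1_2 = {
    1951: (27.16, 8.86), 1961: (40.4, 15.35), 1971: (45.96, 21.97),
    1981: (56.38, 29.76), 1991: (64.13, 39.29), 2001: (75.85, 54.16),
    2011: (80.9, 64.6),
}

# Gender gap (male rate minus female rate), in percentage points.
gap = {year: round(m - f, 2) for year, (m, f) in table_1_2.items()}
for year, g in gap.items():
    print(year, g)
```

Since both rates are already percentages of their respective populations, their difference can be compared across decades without going back to the absolute counts.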
Index Number
An index number is a statistical tool used to show changes in a variable or a group of related variables relative to a base period or location. It measures relative changes, not absolute ones.
Index numbers are widely used, particularly in economics and business (e.g., tracking price changes with the Consumer Price Index), but can also compare conditions across different places or industries.
A common method for calculation is the simple aggregate method. It is calculated as:
$ \text{Index Number} = \frac{\sum q_1}{\sum q_0} \times 100 $
Where:
- $\sum q_1$ = Total quantity (e.g., production) in the current period
- $\sum q_0$ = Total quantity (e.g., production) in the base period
The index for the base period is therefore always 100, and the index for any other period is expressed relative to this base.
Table 1.3 illustrates the production of iron ore in India and the calculation of an index number, taking 1970-71 as the base year (Index = 100).
| Year | Production (in million tonnes) | Calculation | Index Number (Base 1970-71=100) |
|---|---|---|---|
| 1970-71 | 32.5 | $\frac{32.5}{32.5} \times 100$ | 100 |
| 1980-81 | 42.2 | $\frac{42.2}{32.5} \times 100$ | 130 |
| 1990-91 | 53.7 | $\frac{53.7}{32.5} \times 100$ | 165 |
| 2000-01 | 67.4 | $\frac{67.4}{32.5} \times 100$ | 207 |
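The calculation in Table 1.3 can be reproduced with a minimal Python sketch of the simple aggregate formula. The function name `simple_aggregate_index` is hypothetical; with only one commodity (iron ore), each sum contains a single item, but the same function would accept several items per period.

```python
def simple_aggregate_index(current: list[float], base: list[float]) -> float:
    """Index = (sum of current-period quantities / sum of base-period quantities) x 100."""
    return sum(current) / sum(base) * 100

# Iron ore production (million tonnes) from Table 1.3; base year 1970-71.
production = {"1970-71": 32.5, "1980-81": 42.2, "1990-91": 53.7, "2000-01": 67.4}
base = production["1970-71"]

# One commodity, so each sum holds a single value; round to whole numbers as in the table.
indices = {year: round(simple_aggregate_index([q], [base]))
           for year, q in production.items()}
print(indices)  # {'1970-71': 100, '1980-81': 130, '1990-91': 165, '2000-01': 207}
```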
Processing of Data
Once collected, raw data needs to be processed to be understood and analyzed. This involves tabulating the data and classifying it into meaningful categories or groups.
For instance, if you have a list of individual scores for 60 students (like in Table 1.4 in the text), this raw data is difficult to interpret directly.
The first step in processing such ungrouped raw data is to group it into classes. This reduces the volume of data and makes it easier to identify patterns and summarize information.
Grouping of Data
Grouping data involves deciding the number of classes or groups to create and the range of values within each class (the class interval). The choice depends on the overall range of the raw data (the difference between the highest and lowest values).
For example, if scores range from 02 to 96, you could decide to create 10 classes with an interval of 10 units each (e.g., 0-10, 10-20, 20-30, ..., 90-100).
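The choice of classes can be sketched in a few lines of Python. Only the lowest and highest scores from the example are used (the full raw list of Table 1.4 is not reproduced here), and the class width of 10 is the choice made in the text.

```python
import math

lowest, highest = 2, 96          # extremes quoted in the example
data_range = highest - lowest    # 94
width = 10                       # chosen class interval

# Begin at a round number at or below the minimum; end past the maximum so that,
# under the exclusive method, the highest value still has a class to fall into.
start = (lowest // width) * width                  # 0
stop = math.ceil((highest + 1) / width) * width    # 100
classes = [(lo, lo + width) for lo in range(start, stop, width)]
print(data_range, len(classes), classes[0], classes[-1])
```

A wider interval would give fewer, coarser classes; a narrower one more, finer classes. The range (here 94) guides how many classes of a given width are needed.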
Process of Classification
After determining the classes and intervals, the raw data is classified by assigning each individual observation to the appropriate class. A common method for this is the Four and Cross Method (or tally marks).
For each data point, a tally mark is placed in the corresponding class. Tally marks are grouped in fives (four vertical lines crossed by a diagonal line) for easy counting. For example, if a score is 47, a tally mark is added to the 40-50 class.
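The Four and Cross Method can be imitated in text. In this Python sketch the scores are hypothetical (not the Table 1.4 data), and the string `||||/` is only a plain-text stand-in for four vertical strokes crossed by a fifth diagonal one.

```python
from collections import Counter

def tally(count: int) -> str:
    """Draw a count as tally marks; '||||/' stands in for four strokes crossed by a fifth."""
    fives, rest = divmod(count, 5)
    return " ".join(["||||/"] * fives + (["|" * rest] if rest else []))

# Hypothetical scores, for illustration only.
scores = [47, 41, 12, 45, 49, 8, 43, 40]

# Key each score by the lower limit of its 10-unit class (e.g. 47 -> 40, i.e. class 40-50).
counts = Counter((s // 10) * 10 for s in scores)
for lower in sorted(counts):
    print(f"{lower}-{lower + 10}: {tally(counts[lower])} ({counts[lower]})")
```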
Frequency Distribution
Once the data is classified into groups using tally marks, the total count of tally marks in each group gives the number of individuals (or observations) falling into that class. This count is called the frequency for that class.
A table showing the classes and their corresponding frequencies is called a frequency distribution. It illustrates how the values of a variable are distributed across different ranges.
Frequencies are presented as either Simple Frequencies or Cumulative Frequencies.
Simple Frequencies: Denoted by 'f', the simple frequency is the number of observations falling within a particular class or group (as counted from the tally marks). The sum of all simple frequencies ($\sum f$) equals the total number of observations (N).
Cumulative Frequencies: Represented by 'Cf', these are obtained by successively adding the simple frequencies. The cumulative frequency for a class is the sum of its simple frequency and the cumulative frequency of the preceding class. The cumulative frequency for the last class should equal the total number of observations (N).
Cumulative frequencies help in quickly determining the number of observations below or above a certain value.
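Cumulative frequencies are simply running totals of the simple frequencies. This Python sketch takes the simple frequencies of the ten classes in Table 1.6 and uses `itertools.accumulate` to build the Cf column, then reads off how many observations fall below and at or above 50.

```python
from itertools import accumulate

# Simple frequencies (f) of the classes 0-10, 10-20, ..., 90-100 (Table 1.6).
f = [4, 5, 5, 7, 6, 10, 8, 6, 5, 4]
cf = list(accumulate(f))          # each Cf = own f + Cf of the preceding class
N = sum(f)

assert cf[-1] == N                # the last Cf equals the total number of observations
below_50 = cf[4]                  # Cf of class 40-50: observations below 50
print(cf, N, below_50, N - below_50)
```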
When forming classes, especially for quantitative data, two common methods are used:
Exclusive Method
In this method, the upper limit of a class is the same as the lower limit of the next class (e.g., 0-10, 10-20, 20-30). However, observations equal to the upper limit are *excluded* from that class and included in the *next* class where they are the lower limit. For example, a value of 30 would be included in the 30-40 class, not the 20-30 class. This ensures that each observation falls into only one class.
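The exclusive rule (lower limit included, upper limit excluded) amounts to floor division by the class width. A minimal Python sketch, assuming classes of width 10 starting at 0; `exclusive_class` is a hypothetical helper name.

```python
def exclusive_class(value: float, width: int = 10) -> tuple[int, int]:
    """Return (lower, upper) such that lower <= value < upper."""
    lower = int(value // width) * width
    return lower, lower + width

for v in (29.9, 30, 47):
    lo, hi = exclusive_class(v)
    # 30 lands in 30-40, not 20-30: the upper limit is excluded from its class.
    print(f"{v} -> {lo}-{hi}")
```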
Table 1.6 shows frequency distribution using the exclusive method.
| Group | f | Cf |
|---|---|---|
| 00-10 | 4 | 4 |
| 10-20 | 5 | 9 |
| 20-30 | 5 | 14 |
| 30-40 | 7 | 21 |
| 40-50 | 6 | 27 |
| 50-60 | 10 | 37 |
| 60-70 | 8 | 45 |
| 70-80 | 6 | 51 |
| 80-90 | 5 | 56 |
| 90-100 | 4 | 60 |
| $\sum f$ | N = 60 | |
Inclusive Method
In this method, the upper limit of a class is *included* within that same class (e.g., 0-9, 10-19, 20-29). The upper limit of one class is usually one less than the lower limit of the next class. This method ensures that each observation falls into only one class and that both the lower and upper boundaries define the values included.
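For whole-number scores, the inclusive classes (0-9, 10-19, ...) and the exclusive classes (0-10, 10-20, ...) share the same lower limits, which is why Tables 1.6 and 1.7 show identical frequencies. The Python sketch below checks this for scores 0 to 99; both helper names are hypothetical.

```python
def exclusive_class(score: int, width: int = 10) -> tuple[int, int]:
    """Exclusive method: lower <= score < upper, e.g. 20-30."""
    lower = (score // width) * width
    return lower, lower + width

def inclusive_class(score: int, width: int = 10) -> tuple[int, int]:
    """Inclusive method for whole numbers: lower <= score <= upper, e.g. 20-29."""
    lower = (score // width) * width
    return lower, lower + width - 1

print(inclusive_class(29), inclusive_class(30))   # (20, 29) (30, 39)

# Every whole-number score gets a class with the same lower limit under both
# methods, so the frequency counts come out the same.
same = all(exclusive_class(s)[0] == inclusive_class(s)[0] for s in range(0, 100))
print(same)  # True
```

This equivalence holds only for whole numbers; with fractional values such as 29.5, the inclusive classes 20-29 and 30-39 would leave a gap, which is one reason the exclusive method suits continuous data.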
Table 1.7 shows frequency distribution using the inclusive method.
| Group | f | Cf |
|---|---|---|
| 0 – 9 | 4 | 4 |
| 10 – 19 | 5 | 9 |
| 20 – 29 | 5 | 14 |
| 30 – 39 | 7 | 21 |
| 40 – 49 | 6 | 27 |
| 50 – 59 | 10 | 37 |
| 60 – 69 | 8 | 45 |
| 70 – 79 | 6 | 51 |
| 80 – 89 | 5 | 56 |
| 90 – 99 | 4 | 60 |
| $\sum f$ | N = 60 | |
Frequency Polygon
A frequency polygon is a line graph that visually represents a frequency distribution. It is created by plotting points at the midpoints of each class interval on the x-axis and their corresponding frequencies on the y-axis, and then connecting these points with straight lines. It is useful for visualizing the shape of a distribution and comparing multiple distributions.
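The points of a frequency polygon are just (class midpoint, frequency) pairs. This Python sketch computes them from the classes and simple frequencies of Table 1.6; joining them with straight lines in any plotting tool gives the polygon.

```python
# Class limits and simple frequencies from Table 1.6 (exclusive method).
classes = [(lo, lo + 10) for lo in range(0, 100, 10)]
f = [4, 5, 5, 7, 6, 10, 8, 6, 5, 4]

# x = class midpoint, y = frequency of that class.
points = [((lo + hi) / 2, freq) for (lo, hi), freq in zip(classes, f)]
for x, y in points:
    print(f"midpoint {x:>5}: frequency {y}")
```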
Ogive
An Ogive (pronounced 'ojive') is a graphical representation of a cumulative frequency distribution. It shows the cumulative frequency plotted against the upper or lower boundaries of the class intervals.
Ogives are constructed using either the 'less than' or the 'more than' method.
- Less than Ogive: Constructed by plotting cumulative frequencies against the upper limits of the classes. It is an ascending curve.
- More than Ogive: Constructed by plotting cumulative frequencies (starting from the total frequency) against the lower limits of the classes. It is a descending curve.
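Both sets of ogive points can be derived from the simple frequencies alone. In this Python sketch (using the frequencies of Table 1.6), the 'less than' value at each upper limit is the running total, and the 'more than' value at each lower limit is N minus the number of observations in all earlier classes.

```python
from itertools import accumulate

f = [4, 5, 5, 7, 6, 10, 8, 6, 5, 4]      # Table 1.6, classes 0-10 ... 90-100
lower_limits = list(range(0, 100, 10))
upper_limits = [lo + 10 for lo in lower_limits]
N = sum(f)

# 'Less than' ogive: running total plotted against each upper limit (ascending).
less_than = list(zip(upper_limits, accumulate(f)))

# 'More than' ogive: observations at or above each lower limit (descending),
# i.e. N minus everything in the preceding classes.
more_than = [(lo, N - below) for lo, below in zip(lower_limits, [0, *accumulate(f)])]

print(less_than)
print(more_than)
```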
Table 1.8 shows cumulative frequencies using the less than method.
| Less than Method | Cf |
|---|---|
| Less than 10 | 4 |
| Less than 20 | 9 |
| Less than 30 | 14 |
| Less than 40 | 21 |
| Less than 50 | 27 |
| Less than 60 | 37 |
| Less than 70 | 45 |
| Less than 80 | 51 |
| Less than 90 | 56 |
| Less than 100 | 60 |
Table 1.9 shows cumulative frequencies using the more than method.
| More than Method | Cf |
|---|---|
| More than 0 | 60 |
| More than 10 | 56 |
| More than 20 | 51 |
| More than 30 | 46 |
| More than 40 | 39 |
| More than 50 | 33 |
| More than 60 | 23 |
| More than 70 | 15 |
| More than 80 | 9 |
| More than 90 | 4 |
Both the 'less than' and 'more than' ogives can be plotted on the same graph (Fig. 1.8). The value on the x-axis at the point where the two ogives intersect gives the median of the distribution.
Table 1.10 combines the data for both less than and more than methods for plotting a comparative ogive.
| Marks obtained | Less than | More than |
|---|---|---|
| 0 - 10 | 4 | 60 |
| 10 - 20 | 9 | 56 |
| 20 - 30 | 14 | 51 |
| 30 - 40 | 21 | 46 |
| 40 - 50 | 27 | 39 |
| 50 - 60 | 37 | 33 |
| 60 - 70 | 45 | 23 |
| 70 - 80 | 51 | 15 |
| 80 - 90 | 56 | 9 |
| 90 - 100 | 60 | 4 |
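The graphical reading of the median can be approximated numerically. This Python sketch assumes both ogives are drawn as straight segments between plotted points, with the 'less than' curve starting at (0, 0) and the 'more than' curve ending at (100, 0), and finds where they cross by linear interpolation. Because the two cumulative values always add up to N at each boundary, the crossing is at height N/2. The frequencies are those of Table 1.6; `ogive_intersection` is a hypothetical helper.

```python
from itertools import accumulate

f = [4, 5, 5, 7, 6, 10, 8, 6, 5, 4]      # Table 1.6
bounds = list(range(0, 101, 10))          # 0, 10, ..., 100
N = sum(f)
less = [0, *accumulate(f)]                # 'less than' Cf at each boundary
more = [N - c for c in less]              # 'more than' Cf at each boundary

def ogive_intersection(xs, a, b):
    """Return (x, y) where polylines a and b, sampled at xs, cross (linear interpolation)."""
    for i in range(len(xs) - 1):
        d0, d1 = a[i] - b[i], a[i + 1] - b[i + 1]
        if d0 <= 0 <= d1 and d0 != d1:
            t = -d0 / (d1 - d0)           # fraction of the way along segment i
            return xs[i] + t * (xs[i + 1] - xs[i]), a[i] + t * (a[i + 1] - a[i])
    return None

median_x, median_y = ogive_intersection(bounds, less, more)
print(median_x, median_y)   # about 53 and 30 (= N/2)
```

Reading the intersection off a hand-drawn graph should give approximately the same median, within the accuracy of the drawing.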
Exercises
This section contains questions and activities designed to reinforce understanding of the concepts of data, its sources, compilation, processing, and presentation discussed in the chapter.